Parameter Database: Data-centric Synchronization for Scalable Machine Learning

Authors

  • Naman Goel
  • Divyakant Agrawal
  • Sanjay Chawla
  • Ahmed K. Elmagarmid

Abstract

We propose a new data-centric synchronization framework for carrying out machine learning (ML) tasks in a distributed environment. Our framework exploits the iterative nature of ML algorithms and relaxes the application-agnostic bulk synchronous parallel (BSP) paradigm that has previously been used for distributed machine learning. Data-centric synchronization complements function-centric synchronization, which uses stale updates to increase the throughput of distributed ML computations. Experiments to validate our framework suggest that we can attain substantial improvement over BSP while guaranteeing sequential correctness of ML tasks.
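
The abstract does not spell out the synchronization mechanism, so the following is only a minimal sketch of the general idea it contrasts against: a BSP barrier versus a relaxed rule that tolerates stale updates. It is not the paper's data-centric framework. The function name `may_proceed`, the `clocks` list, and the `staleness` parameter are all hypothetical, introduced here for illustration.

```python
from typing import Sequence


def may_proceed(worker: int, clocks: Sequence[int], staleness: int) -> bool:
    """Decide whether `worker` may start its next iteration.

    clocks[i] is the number of iterations worker i has completed.
    staleness is the largest lead a worker may have over the slowest
    worker (hypothetical parameter; staleness=0 behaves like a BSP
    barrier, where nobody runs ahead of the slowest worker).
    """
    lead = clocks[worker] - min(clocks)
    return lead <= staleness


clocks = [3, 1, 2]  # illustrative values only
print(may_proceed(0, clocks, staleness=0))  # worker 0 leads by 2, so it waits
print(may_proceed(0, clocks, staleness=2))  # a lead of 2 is tolerated
```

With `staleness=0` every worker waits for the slowest one, which is the strict behaviour the abstract describes as BSP; a larger bound trades fresher parameters for higher throughput.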

Similar resources

Factorized Databases: Past and Future Past

In this talk I will give an overview of the FDB project at Oxford on succinct, lossless representations of relational data that I call factorized databases. I will first present a characterization of the succinctness of results to conjunctive queries and how factorizations can speed up query processing. I will then comment on how this succinctness characterization relates to seemingly disparate results on:...

Access control in ultra-large-scale systems using a data-centric middleware

The primary characteristic of an Ultra-Large-Scale (ULS) system is ultra-large size on any related dimension. A ULS system is generally considered a system-of-systems with heterogeneous nodes and autonomous domains. As the size of a system-of-systems grows and the demand for interoperability between sub-systems increases, achieving a more scalable and dynamic access control system becomes an im...

PivotalR: A Package for Machine Learning on Big Data

PivotalR [1] is an R package that provides a front-end to PostgreSQL [2] and all PostgreSQL-like databases, such as Pivotal Inc.'s Greenplum Database (GPDB) [3] and HAWQ [4] on Hadoop. PivotalR also provides an R wrapper for MADlib [5]. MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine-learning al...

A Speech Driven Face Animation System Based on Machine Learning

Lip synchronization is the key issue in speech-driven face animation systems. In this paper, clustering and machine learning methods are combined to estimate face animation parameters from audio sequences, and the learning results are then applied to an MPEG-4 based speech-driven face animation system. Based on a large recorded audio-visual database, an unsupervised cluster algorithm is prop...

Database Establishment for Machine Learning in NILM

Nonintrusive load monitoring (NILM) is the problem of identifying operating appliances and estimating their energy consumption based on whole-home electric signals. Machine learning concepts and methods have gradually been applied to tackle NILM. A key factor in enabling and advancing machine learning methods for any problem is the availability of proper databases. The Reference Energy Disaggrega...

Journal title:
  • CoRR

Volume: abs/1508.00703

Publication date: 2015